Fast Content-based Visual Mapping for Interactive Exploration of Document Collections

Authors

  • Rosane Minghim
  • Fernando Vieira Paulovich
  • Alneu de Andrade Lopes
Abstract

This paper presents a fast technique for generating maps of document collections that groups (and separates) documents by their contents at a very manageable computational cost, producing maps of pre-processed text in a matter of seconds. Based on multi-dimensional projection techniques and an algorithm for projection improvement, it yields a surface map on which the user can identify a number of important relationships between documents and groups of documents. These relationships are reflected in visual attributes such as height, color, and isolines, as well as in aural attributes such as pitch. The map is interactive, allowing further exploration and narrowing of focus during a search task. The technique, named IDMAP (Interactive Document Map), is fully described in this paper. The results are bound to support a large number of applications that rely on retrieval and examination of document collections.
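
The abstract does not say how content similarity between documents is measured. As a rough illustration only, and not the IDMAP method itself, the sketch below computes a content-based dissimilarity matrix of the kind a multi-dimensional projection could take as input. The choice of cosine distance over raw word counts, and the names `cosine_distance` and `distance_matrix`, are assumptions made for this example.

```python
import math
from collections import Counter


def cosine_distance(text_a, text_b):
    """Return 1 - cosine similarity of the two texts' word-count vectors.

    Assumption: whitespace tokenization and raw counts; the paper's own
    pre-processing is not described in the abstract.
    """
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty document is treated as maximally dissimilar.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def distance_matrix(docs):
    """Pairwise content dissimilarities for a list of document strings."""
    return [[cosine_distance(a, b) for b in docs] for a in docs]
```

A projection technique would then place each document in the plane so that distances on the map approximate the entries of this matrix; documents sharing vocabulary end up near each other, and unrelated ones are pushed apart.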

Similar resources

WEBSOM - Self-organizing maps of document collections

Searching for relevant text documents has traditionally been based on keywords and Boolean expressions of them. Often the search results show high recall and low precision, or vice versa. Considerable efforts have been made to develop alternative methods, but their practical applicability has been low. Powerful methods are needed for the exploration of miscellaneous document collections. The WEB...

Combining Visualization and Interactive Clustering for Exploring Large Document Pools

In this work we present a visual method for exploring huge document pools. Our method is based on topological visualization with a self-organizing map and on document clustering into categories, combining the advantages of both concepts. The proposed method is interactive: the user can explore the document pool with two different interaction modalities, from the perspective he is interested in and with differ...

Large Image Collections - Comprehension and Familiarization by Interactive Visual Analysis

Large size and complex multi-dimensional characteristics of image collections demand a multifaceted approach to exploration and analysis providing better comprehension and appreciation. We explore large and complex data-sets composed of images and parameters describing the images. We describe a novel approach providing new and exciting opportunities for the exploration and understanding of such...

TopicViz: Semantic Navigation of Document Collections

When people explore and manage information, they think in terms of topics and themes. However, the software that supports information exploration sees text at only the surface level. In this paper we show how topic modeling – a technique for identifying latent themes across large collections of documents – can support semantic exploration. We present TopicViz, an interactive environment for inf...

Text Mining with the WEBSOM

The emerging field of text mining applies methods from data mining and exploratory data analysis to analyzing text collections and to conveying information to the user in an intuitive manner. Visual, map-like displays provide a powerful and fast medium for portraying information about large collections of text. Relationships between text items and collections, such as similarity, clusters, gaps a...

Publication date: 2005